Hadoop mapreduce user_group_daniel_sikar_presentation_06.09.2010
- 14. HTTP Logs
    Log file A:
    (...) FreeTouchScreenNokia5230 (...)
    (...) GetRidofAllSpeedCameras (...)
    (...) USManWinsLottery (...)
    (...) BNPToLaunchElectionManifesto (...)
    Log file B:
    (...) FreeTouchScreenNokia5230 (...)
    (...) BodyLanguageTellsAll (...)
- 23. Launching a virtual Hadoop Cluster
    $ elastic-mapreduce --create --name "Wiki log crunch" --alive --num-instances 20 --instance-type c1.medium
    Created job flow <job flow id>
    $ ec2din (...)
- 37. Add a step
    $ elastic-mapreduce --jobflow <jfid> --stream --step-name "Wiki log crunch" \
        --input s3n://dsikar-wikilogs-2009/dec/ \
        --output s3n://dsikar-wikilogs-output/21 \
        --mapper s3n://dsikar-wiki-scripts/wikidictionarymap.pl \
        --reducer s3n://dsikar-wiki-scripts/wikireduce.pl
    http://<instance public dns>:9100
- 38. s3cmd
    # make bucket
    $ s3cmd mb s3://dsikar-wikilogs
    # put log files (the destination ends in "/" because several files are uploaded at once)
    $ s3cmd put pagecounts-200912*.gz s3://dsikar-wikilogs/dec/
    $ s3cmd put pagecounts-201004*.gz s3://dsikar-wikilogs/apr/
    # list log files
    $ s3cmd ls s3://dsikar-wikilogs/
    # put scripts
    $ s3cmd put *.pl s3://dsikar-wiki-scripts/
    # delete log files
    $ s3cmd del --recursive --force s3://dsikar-wikilogs/
    # remove bucket
    $ s3cmd rb s3://dsikar-wikilogs/
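
The streaming step on slide 37 names a mapper and a reducer script, but the slides do not show what they contain. As a rough illustration of the contract a Hadoop streaming job expects (the mapper emits tab-separated key/value lines, Hadoop sorts them by key, the reducer aggregates consecutive lines that share a key), here is a minimal sketch in Python rather than the Perl used in the talk. It is not the author's `wikidictionarymap.pl` or `wikireduce.pl`. The function names, the four-field pagecounts layout (project, page title, request count, bytes) and the filter on the `en` project are all assumptions made for the example.

```python
from itertools import groupby


def map_lines(lines, project="en"):
    """Mapper sketch: emit 'title<TAB>count' for each usable log line."""
    for line in lines:
        fields = line.split()
        # Assumed layout: project, page title, request count, bytes served.
        if len(fields) != 4 or fields[0] != project:
            continue
        if not fields[2].isdigit():
            continue
        yield f"{fields[1]}\t{fields[2]}"


def reduce_lines(sorted_lines):
    """Reducer sketch: sum counts per title; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for title, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{title}\t{total}"


def run_locally(lines):
    """Hypothetical helper: map, sort, reduce in one process.

    The sort stands in for the shuffle phase Hadoop performs between
    the mapper and the reducer.
    """
    return list(reduce_lines(sorted(map_lines(lines))))
```

In a real streaming job each stage would be its own script reading standard input and writing standard output. Calling `run_locally` on a handful of sample lines is a cheap way to check the logic before paying for a cluster.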
Editor's Notes
- So, without further ado, let's get this show on the road and run a job concurrently on a few virtual machines.